Computer-Assisted Text Analysis for Social Science: Topic Models and Beyond

Author

Ryan Wesslen

Abstract

Topic models are a family of statistical algorithms for summarizing, exploring, and indexing large collections of text documents. After a decade of research led by computer scientists, topic models have spread to social science as a new generation of data-driven social scientists has searched for tools to explore large collections of unstructured text. Recently, social scientists have contributed to the topic model literature with developments in causal inference and tools for handling the problem of multi-modality. In this paper, I provide a literature review on the evolution of topic modeling, including extensions for document covariates, methods for evaluation and interpretation, and advances in interactive visualizations, along with each aspect's relevance and application for social science research.

Keywords: computational social science, computer-assisted text analysis, visual analytics, structural topic model

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Probabilistic Topic Models

Over the last decade, probabilistic topic models have emerged as an extremely powerful and popular tool for analyzing large collections of unstructured data. While originally proposed for textual data, topic models have since been applied to various other types of data, such as images, videos, music, social networks and biological data. In this tutorial, I will discuss both the modeling and al...

Computer-Assisted Keyword and Document Set Discovery from Unstructured Text

The (unheralded) first step in many applications of automated text analysis involves selecting keywords to choose documents from a large text corpus for further study. Although all substantive results depend crucially on this choice, researchers typically pick keywords in ad hoc ways, given the lack of formal statistical methods to help. Paradoxically, this often means that the validity of the ...

Distill-ery: Iterative Topic Modeling to Improve the Content Analysis Process

Dimensionality reduction techniques such as Latent Dirichlet Allocation (LDA) provide extremely useful ways of making better sense of large, unstructured text corpora quickly and efficiently. However, beyond small circles typically comprising computer science and statistics experts, topic modeling has failed to become a commonplace technique incorporated into mainstream data analysis proces...

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text of tweets via one of the machine l...

Publication date: 2018
